tmp <- 10
tmp1 <- tmp * 24
2024-12-12
In general:
Covering this session’s topics
Easy to write: familiarity with the code editor and libraries.
Easy to understand: structured, with consistent variable names, and commented.
Easy to debug: clear naming, DRY, tests.
Easy to run: 🏎️ (profiling, C++, using “optimized” code).
Names should be consistent, descriptive, lower case, and readable.
For which snippet is it easier to guess the context?
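As a point of comparison with throwaway names such as tmp and tmp1, here is a minimal sketch of the same arithmetic with descriptive names. The interpretation (days converted to hours) and all three names are assumptions made for illustration, not taken from the original material.

```r
# Hypothetical interpretation: 10 days expressed in hours.
n_days <- 10
hours_per_day <- 24
total_hours <- n_days * hours_per_day
```

With these names, a reader can guess the context without any surrounding comments.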
Functions are first-class citizens in R
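A short sketch of what "first-class" means in practice: functions can be assigned, passed as arguments, returned from other functions, and stored in lists. The helper names (square, apply_twice, make_power, summaries) are hypothetical and exist only for this illustration.

```r
# A function can be stored in a variable like any other object.
square <- function(x) x^2

# A function can be passed as an argument to another function.
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)

# A function can be returned from another function (a closure).
make_power <- function(exponent) {
  function(x) x^exponent
}
cube <- make_power(3)
cube(2)

# Functions can be kept in a list and called in turn.
summaries <- list(mean = mean, median = median, sd = sd)
results <- vapply(summaries, function(f) f(c(1, 2, 3, 4, 10)), numeric(1))
```

This is the property that makes the "apply" family work: the function to run is just another argument.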
Rethink for and while loops; use the “apply” family instead
“To become significantly more reliable, code must become more transparent. In particular, nested conditions and loops must be viewed with great suspicion. Complicated control flows confuse programmers. Messy code often hides bugs.”
— Bjarne Stroustrup
Say you want to extract the \(R^2\) from three linear models with different predictors (or formulae).
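The task above can be sketched both ways. This example assumes the built-in mtcars data and an arbitrary choice of predictors; the formulae and the helper extract_r_squared are hypothetical, chosen only to make the comparison concrete.

```r
# Three candidate formulae (hypothetical choice of predictors on mtcars).
formulae <- list(
  mpg ~ wt,
  mpg ~ wt + hp,
  mpg ~ wt + hp + qsec
)

# Loop version: pre-allocate the result, then fill it element by element.
r_squared_loop <- numeric(length(formulae))
for (i in seq_along(formulae)) {
  fit <- lm(formulae[[i]], data = mtcars)
  r_squared_loop[i] <- summary(fit)$r.squared
}

# "apply" version: describe what happens to one formula, then map it over all.
extract_r_squared <- function(formula, data) {
  summary(lm(formula, data = data))$r.squared
}
r_squared <- vapply(formulae, extract_r_squared, numeric(1), data = mtcars)
```

Both versions give the same numbers; the vapply() version has no index bookkeeping and states the expected return type (numeric(1)) up front.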
parallel package
You can imagine wanting to run each of the apply/for loop iterations in parallel.
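A minimal sketch of running such iterations in parallel with parLapply() from the parallel package. The use of mtcars, the formulae, the helper extract_r_squared, and the choice of two workers are all assumptions for illustration.

```r
library(parallel)

# Hypothetical task: fit three linear models and extract R^2 from each.
formulae <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)
extract_r_squared <- function(formula, data) {
  summary(lm(formula, data = data))$r.squared
}

# Start two worker processes (an arbitrary number for this sketch).
cluster <- makeCluster(2)

# parLapply() mirrors lapply(), distributing the list elements across workers.
r_squared <- parLapply(cluster, formulae, extract_r_squared, data = mtcars)

# Release the workers once the work is done.
stopCluster(cluster)

unlist(r_squared)
```

For tasks this small, the cost of starting workers and copying data will likely outweigh any gain; parallelism tends to pay off only when each iteration is expensive.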
data.table is a package that extends the data.frame class.
dplyr for large datasets.
dt[i, j, by]
dt …
iris
iris: .VARS
The profvis package is a good package to use for profiling.
library(profvis)
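library(data.table)
# Simulate a wide data set: 400,000 rows and 150 numeric columns.
n <- 4e5
cols <- 150
data <- as.data.frame(x = matrix(rnorm(n * cols, mean = 5), ncol = cols))
# Add a character id column so the data frame is not purely numeric.
data <- cbind(id = paste0("g", seq_len(n)), data)
# The same data as a data.table.
dataDF <- as.data.table(data)
numeric_vars <- setdiff(names(data), "id")
# Profile several ways of computing the column means.
profvis({
  # apply(): coerces the data frame to a matrix, then calls mean() per column.
  means <- apply(data[, names(data) != "id"], 2, mean)
  # colMeans(): vectorised column means.
  means <- colMeans(data[, names(data) != "id"])
  # lapply(): iterates over columns and returns a list.
  means <- lapply(data[, names(data) != "id"], mean)
  # vapply(): like lapply(), but returns a type-checked numeric vector.
  means <- vapply(data[, names(data) != "id"], mean, numeric(1))
  # matrixStats::colMeans2(): needs an explicit matrix.
  means <- matrixStats::colMeans2(as.matrix(data[, names(data) != "id"]))
  # data.table: apply mean() to the columns selected by .SDcols.
  means <- dataDF[, lapply(.SD, mean), .SDcols = numeric_vars]
})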
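The dt[i, j, by] form of data.table mentioned earlier can be sketched on the built-in iris data. The object and column names introduced here (iris_dt, mean_petal_length, numeric_cols, species_means) are hypothetical; the pattern with .SD and .SDcols is the same one used in the profiling code above.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# i: keep rows with Sepal.Length > 5
# j: compute the mean petal length
# by: do so separately for each species
iris_dt[Sepal.Length > 5,
        .(mean_petal_length = mean(Petal.Length)),
        by = Species]

# .SD is the subset of data for each group; .SDcols picks which columns it holds.
numeric_cols <- setdiff(names(iris_dt), "Species")
species_means <- iris_dt[, lapply(.SD, mean), by = Species, .SDcols = numeric_cols]
```

Read it as: take dt, subset rows with i, then compute j, grouped by by.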